Abstract

The dramatic increase of network infrastructure comes at the cost of rapidly increasing energy consumption,
which makes optimization of energy efficiency (EE) an important topic. Since EE is often modeled as the ratio of
rate to power, we present a mathematical framework called fractional programming that provides insight into this
class of optimization problems, as well as algorithms for computing the solution. The main idea is that the objective
function is transformed to a weighted sum of rate and power. A generic problem formulation for systems dissipating
transmit-independent circuit power in addition to transmit-dependent power is presented. We show that a broad class
of EE maximization problems can be solved efficiently, provided the rate is a concave function of the transmit power.
We elaborate examples of various system models including time-varying parallel channels. Rate functions with an
arbitrary discrete modulation scheme are also treated. The examples considered lead to water-filling solutions, but
these are different from the dual problems of power minimization under rate constraints and rate maximization under
power constraints, respectively, because the constraints need not be active. We also demonstrate that if the solution
to a rate maximization problem is known, it can be utilized to reduce the EE problem into a one-dimensional convex
problem.

I. Introduction 

Exponentially increasing data traffic and demand for ubiquitous access have triggered a dramatic expansion
of network infrastructure, which comes at the cost of rapidly increasing energy consumption and a considerable
carbon footprint of the mobile communications industry. Therefore, increasing the energy efficiency (EE) in cellular
networks has become an important and urgent task. Apart from this, EE plays an important role in other areas of
wireless communications as well. For example, in multihop networks, EE is critical for prolonging the lifetime
of the network [1]. EE is also becoming increasingly important in mobile communication devices since battery
capacity is unable to keep pace with increasing power dissipation of signal processing circuits [2].

A comprehensive survey of joint PHY and MAC layer techniques for improving wireless EE can be found in [3].
In an effort to integrate the fundamental issues related to EE in wireless networks, [4] presents four fundamental
EE trade-offs in detail. The paper at hand is concerned with the trade-off between spectral efficiency (SE) and EE.
In particular, we look at practical transmission systems dissipating transmit-independent circuit power in addition
to the transmit-dependent power. As described in [5], the link-level EE optimization problem in the active mode is
closely related to two classical problems, one being rate maximization subject to a maximum power constraint and
the other one being power minimization subject to a minimum rate constraint. Both problems lead to water-filling
solutions, with the water level determined by the respective constraint. In comparison, EE optimization involves
maximizing the amount of transmitted data per unit energy, or equivalently minimizing the energy consumption
per bit. It turns out that the EE optimization problem also results in water-filling solutions, with a water level
that depends on the transmit-independent power. Results on energy-efficient link adaptation for frequency-selective
channels are presented in [6]. A related efficiency objective function, which involves the packet success rate, has
been treated in a game-theoretic setting utilizing pricing to achieve EE in [7].

The contribution of this paper is a framework for solving EE maximization problems, which are different from 
the related problems of power minimization under rate constraints and rate maximization under power constraints, 
respectively. EE maximization belongs to a class of optimization problems called fractional programs. Since the 
fractional programming theory is not well-known in the wireless communications community, results that are 
presently scattered in the operations research literature are summarized in a coherent manner. With this, we also
show that the various approaches to the problem are mathematically connected through a scalarized bi-criterion 
optimization problem and provide an efficient solution algorithm. These results can be used to solve a large 
class of EE problems based on various system models. A series of applications ranging from time-invariant, flat- 
fading parallel channels to time-varying, flat-fading (single and parallel) channels illustrates the applicability of the 
developed framework. Results are shown to be applicable even for discrete modulation schemes. The algorithmic 
solutions have very low complexity because they are based on water-filling power allocation. In contrast to sum 
rate maximization or sum power minimization, however, the water level is not adjusted iteratively to satisfy the 
constraint with equality. Instead, the water level is used as a parameter that is adjusted until a certain criterion 
corresponding to the maximum EE is fulfilled. Finally, a direct reuse of standard rate maximization algorithms in 
a nested programming procedure, which is made possible using the framework, is discussed. 

The outline is as follows. A motivating example including the channel and power model is given in Section II.
Section III lays out the mathematical framework for the paper. Both the maximization case (for maximizing the
bit/J metric) and the minimization case (for minimizing the J/bit metric) are discussed. Incorporation of various empirical
power dissipation models into a generic EE problem formulation is demonstrated in Section IV. Based on this
generic problem, results for different fading models (static and time-varying channels) and for practical modulation
schemes are presented in Section V. We further discuss how known rate maximization algorithms from the literature
can be adapted to EE optimization. Simulation results based on the models discussed are presented in Section VI.
The paper is wrapped up with a discussion about the water-filling solutions in Section VII, followed by some
conclusions in Section VIII.

Our notation is as follows. Column vectors are denoted by bold lowercase letters, e.g. $\mathbf{x}$, with the $i$th
component denoted by $x_i$. Sets are denoted by calligraphic letters such as $\mathcal{S}$. $[x]^+$ denotes $\max\{0, x\}$, and
$[x]_z^y$ denotes $\min\{y, \max\{z, x\}\}$. A column vector of all ones is denoted by $\mathbf{1}$, and the component-wise sum
of a vector $\mathbf{x}$ is denoted by $\mathbf{1}^T\mathbf{x}$.

II. Motivating example 

In order to motivate the development of a general framework, we provide an anecdotal example of EE maximization.
Consider a time-invariant Gaussian channel with $K$ parallel quasi-static block flat-fading channels with
coherence time $T_c$ and gains $\gamma_1, \ldots, \gamma_K$. Perfect channel state information (CSI) is available at the transmitter as
well as at the receiver. Each parallel channel occupies a bandwidth of $W_c$ and elastic data is to be transmitted.
Assuming Gaussian codebooks at the transmit side, the achievable data rate on channel $k$ in bits per complex
dimension is $r_k(p_k) = \log_2(1 + \gamma_k p_k)$ with transmit power allocation per unit bandwidth $p_k \ge 0$. The amount of
information transmitted during a time-frequency chunk $T_c W_c$ is given by

\[ T_c W_c \sum_{k=1}^{K} \log_2(1 + \gamma_k p_k) \quad \text{[bits]}. \tag{1} \]

In [8], a power model for the nodes in a wireless network is proposed. The total power consumption in the
active mode at the transmitter is modeled as $P_{ont} = P_{PA} + P_{ct}$, where $P_{PA}$ is the power dissipated in the power
amplifier and $P_{ct}$ is the power dissipated in all other circuit blocks. The power dissipated in the power amplifier
is given by $P_{PA} = \frac{\xi}{\eta} P_t$, where $\xi$ and $\eta$ are the power amplifier output backoff (OBO) and drain efficiency,
respectively, and $P_t = W_c \sum_{k=1}^{K} p_k$ is the transmit power. The OBO is needed to avoid the nonlinear region of the
power amplifier and is determined by the peak-to-average power ratio (PAPR). The circuit power $P_{ct}$ is given by
$P_{ct} = P_{mix} + P_{syn} + P_{filt} + P_{DAC}$, where the terms correspond to the power dissipation of the mixer, the frequency
synthesizer, the active filters, and the digital-to-analog converter, respectively. The amount of energy consumed
during one time-frequency chunk is

\[ T_c \cdot (P_{ct} + P_{PA}) = T_c W_c \cdot \frac{\xi}{\eta} \left( \mu + \sum_{k=1}^{K} p_k \right) \quad \text{[Joule]}, \tag{2} \]

where $\mu = \frac{\eta}{\xi} \frac{P_{ct}}{W_c}$ [W/Hz].

In a general sense, efficiency can be seen as the extent to which a resource, such as electricity, is used for the
intended purpose. Efficiency is a measurable concept, quantitatively determined by the ratio of output to input. In
the physical and medium access control layers, the output is the effective amount of data transmitted (measured in
bits or nats) and the input is the total energy consumed for transmitting the data (in Joule). This results in the EE,
defined as the amount of data transmitted (1) divided by the amount of energy consumed (2), as

\[ \mathrm{EE} = \log_2 e \cdot \frac{\eta}{\xi} \cdot \frac{\sum_{k=1}^{K} \log(1 + \gamma_k p_k)}{\mu + \sum_{k=1}^{K} p_k} = \log_2 e \cdot \frac{\eta}{\xi} \cdot \frac{f_1(\mathbf{p})}{f_2(\mathbf{p})} \quad \text{[bits/Joule]}. \tag{3} \]

The EE in (3) is usually maximized subject to constraints on the transmit powers $p_1, \ldots, p_K$ and the sum rate.
Spectral mask constraints $0 \le p_k \le p_{max}$ are required by regulatory bodies. Sum power constraints are required
in order to limit interference in neighboring sectors. An additional sum rate constraint $R_0$ can model the quality
of service requirement of the traffic in the next block. Based on (3), the resulting optimization problem is

\[ \underset{\mathbf{p} \in \mathcal{S}}{\text{maximize}} \; \frac{f_1(\mathbf{p})}{f_2(\mathbf{p})} \quad \text{with} \quad \mathcal{S} = \Big\{ \mathbf{p} \in \mathbb{R}_+^K : \sum_{k=1}^{K} p_k \le P, \; p_k \le p_{max}, \; f_1(\mathbf{p}) \ge R_0 \Big\}, \tag{4} \]

where $P$ is the maximum sum power. Problem (4) belongs to a class of optimization problems called fractional programs.
As we will see later, many more examples of EE maximization problems in different wireless communication
scenarios lead to fractional programs. Therefore, we study this class in more detail in the next section.
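As a numerical illustration of the motivating example, the following minimal sketch evaluates the transmitted bits (1), the consumed energy (2), and the resulting EE (3) for a given power allocation. The function names and all parameter values are hypothetical choices made for this illustration; the sketch assumes the power amplifier model $P_{PA} = (\xi/\eta) P_t$ described above.

```python
import math

def transmitted_bits(gains, powers, T_c, W_c):
    """Eq. (1): bits sent during one time-frequency chunk T_c * W_c."""
    return T_c * W_c * sum(math.log2(1.0 + g * p) for g, p in zip(gains, powers))

def consumed_energy(powers, T_c, W_c, xi, eta, P_ct):
    """Eq. (2): energy in Joule, T_c * (P_ct + P_PA) with P_PA = (xi / eta) * P_t."""
    P_t = W_c * sum(powers)          # total transmit power in Watt
    return T_c * (P_ct + xi / eta * P_t)

def energy_efficiency(gains, powers, W_c, xi, eta, P_ct):
    """Eq. (3): EE in bits/Joule, written as a ratio f1(p) / f2(p)."""
    mu = eta * P_ct / (xi * W_c)     # transmit-independent offset in W/Hz
    f1 = sum(math.log(1.0 + g * p) for g, p in zip(gains, powers))  # sum rate in nats
    f2 = mu + sum(powers)
    return math.log2(math.e) * (eta / xi) * f1 / f2

# Hypothetical example values (not taken from the paper).
gains, powers = [2.0, 0.5], [1.0, 0.3]
ee = energy_efficiency(gains, powers, W_c=1e4, xi=2.0, eta=0.35, P_ct=0.1)
```

Note that the chunk size $T_c W_c$ cancels in the ratio, which is why `energy_efficiency` does not need $T_c$.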

III. Fractional Programming 
Fractional programs are nonlinear programs where the objective function is a ratio of two real-valued functions.
For simplicity, only differentiable fractional programs, i.e. where both the numerator and the denominator are
differentiable, are considered in this section. A general nonlinear fractional program has the form

\[ \underset{x \in S}{\text{maximize}} \; q(x) = \frac{f_1(x)}{f_2(x)}, \tag{5} \]

where $S \subseteq \mathbb{R}^n$, $f_1, f_2 : S \to \mathbb{R}$ and $f_2(x) > 0$. Problem (5) is called a concave-convex fractional program if $f_1$
is concave, $f_2$ is convex, and $S$ is a convex set; additionally $f_1(x) \ge 0$ is required, unless $f_2$ is affine. When $f_1$
and $f_2$ are differentiable, the objective function in (5) is pseudoconcave [9], implying that any stationary point is a
global maximum and that the Karush-Kuhn-Tucker (KKT) conditions are sufficient if a constraint qualification is
fulfilled. Because of this, (5) can be solved directly by various convex programming algorithms [9]. However, when
$f_1$ is concave and $f_2$ is convex, the fractional program can be transformed to an equivalent convex program, which
may be solved more efficiently in certain cases. In the literature, two different convex formulations and an approach
based on duality have been suggested [10]. In the following, we will discuss each approach in some detail. As we
will see, however, they are very closely related since they all lead to the same optimality condition.

A. Parametric convex program 

Consider the following equivalent form [11, p. 134] of the fractional program (5):

\[ \underset{x \in S, \, \lambda \in \mathbb{R}}{\text{maximize}} \; \lambda \quad \text{subject to} \quad \frac{f_1(x)}{f_2(x)} - \lambda \ge 0. \]

Rearranging the constraint, we obtain

\[ \underset{x \in S, \, \lambda \in \mathbb{R}}{\text{maximize}} \; \lambda \quad \text{subject to} \quad f_1(x) - \lambda f_2(x) \ge 0. \]

This formulation is not jointly convex in $x$ and $\lambda$, but for a fixed value of $\lambda$ we have a feasibility problem in $x$,
which is convex if $f_1$ is concave and $f_2$ is convex. The problem is feasible if

\[ \max_{x \in S} \; f_1(x) - \lambda f_2(x) \ge 0. \]

One can use bisection to find the optimal value of the parameter $\lambda$, solving the feasibility problem at each step of
the algorithm, as described in more detail in [11, pp. 145-146].
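The bisection procedure can be sketched as follows. This is a minimal illustration rather than the algorithm of the cited reference: the function name `bisect_lambda`, the scalar toy problem $f_1(x) = \log(1+x)$, $f_2(x) = \mu + x$ with $x \ge 0$ and $\mu = 1$, and the bracketing interval are assumptions made for this example. For this toy problem the inner maximizer is $x = [1/\lambda - 1]^+$.

```python
import math

def bisect_lambda(inner_max, lam_lo, lam_hi, tol=1e-12):
    """Bisection on lambda.

    inner_max(lam) must return max_x f1(x) - lam * f2(x) over the feasible set.
    lam_lo must be feasible (inner_max >= 0) and lam_hi infeasible (inner_max < 0).
    """
    while lam_hi - lam_lo > tol:
        lam = 0.5 * (lam_lo + lam_hi)
        if inner_max(lam) >= 0.0:
            lam_lo = lam      # feasible: the optimum is at least lam
        else:
            lam_hi = lam      # infeasible: the optimum is below lam
    return lam_lo

MU = 1.0  # hypothetical offset for the toy problem

def inner_max(lam):
    """Closed-form inner maximum for f1 = log(1 + x), f2 = MU + x, x >= 0."""
    x = max(0.0, 1.0 / lam - 1.0)
    return math.log(1.0 + x) - lam * (MU + x)

lam_star = bisect_lambda(inner_max, lam_lo=0.01, lam_hi=1.0)
```

For $\mu = 1$ the maximum of $\log(1+x)/(1+x)$ is $1/e$, attained at $x = e - 1$, which gives a simple check of the returned value.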
Consider the function

\[ F(\lambda) = \max_{x \in S} \; f_1(x) - \lambda f_2(x). \tag{6} \]

It can be shown that $F(\lambda)$ is convex, continuous and strictly decreasing in $\lambda$ [12]$^1$. The right hand side of (6) can
be viewed as a scalarized bi-criterion optimization problem in which $f_1(x)$ is to be maximized whereas $f_2(x)$
is to be minimized. The parameter $\lambda$ determines the relative weight of the denominator. If $x^*$ is optimal for the
scalar problem, then it is Pareto-optimal for the bi-criterion optimization problem [11, pp. 178-184]. The set of
Pareto optimal values for a bi-criterion problem is called the optimal trade-off curve. By varying the value of $\lambda$,
we explore the optimal trade-off curve between the objectives, as illustrated in Figure 1. The slope of the optimal
trade-off curve at any point represents the local optimal trade-off between the two objectives. Where the slope is
steep, small changes in $f_2$ result in large changes in $f_1$. The intersection of the curve with a vertical line $f_2 = \alpha$
gives the maximum value of $f_1$ that achieves $f_2 \le \alpha$. Similarly, the intersection with a horizontal line $f_1 = \beta$ gives
the minimum value of $f_2$ that achieves $f_1 \ge \beta$.

Let $q^*$ be the optimum value of the objective function in (5). The following statements are equivalent [10]:

\[ F(\lambda) > 0 \iff \lambda < q^* \]
\[ F(\lambda) = 0 \iff \lambda = q^* \]
\[ F(\lambda) < 0 \iff \lambda > q^* \]

Thus, solving problem (5) is equivalent to finding the root of the nonlinear function $F(\lambda)$, so the condition for
optimality is

\[ F(\lambda^*) = \max_{x \in S} \; f_1(x) - \lambda^* f_2(x) = 0. \tag{7} \]

Various iterative algorithms are available for finding the root of $F(\lambda)$ [13]. For example, the Dinkelbach method
[12] in Algorithm 1 is based on the application of Newton's method. To see this, note that the update in Newton's
method is calculated as

\[ \lambda_{n+1} = \lambda_n - \frac{F(\lambda_n)}{F'(\lambda_n)} = \lambda_n - \frac{f_1(x_n^*) - \lambda_n f_2(x_n^*)}{-f_2(x_n^*)} = \frac{f_1(x_n^*)}{f_2(x_n^*)}, \]

where $x_n^*$ maximizes $f_1(x) - \lambda_n f_2(x)$ over $S$. Therefore, the sequence converges to the optimal point with a superlinear convergence rate. A detailed convergence
analysis can be found in [14]. The initial point can be any $\lambda_0$ that satisfies $F(\lambda_0) \ge 0$. It is also straightforward to
include box constraints for $f_1(x)$ or $f_2(x)$. Referring to Figure 1, a lower bound on $f_1$ or $f_2$ corresponds to an upper
bound on $\lambda$, say $\lambda_{max}$, whereas an upper bound on $f_1$ or $f_2$ corresponds to a lower bound $\lambda_{min}$. Therefore, solving
an optimization problem with this kind of inequality constraints reduces to solving the unconstrained problem and
determining whether $\lambda^*$ falls within the interval $[\lambda_{min}, \lambda_{max}]$. If not, $\lambda^*$ is replaced by the respective endpoint.

$^1$In fact, these properties of $F(\lambda)$ are true for more general nonlinear fractional programs [12].
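The Dinkelbach iteration described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not a transcription of Algorithm 1: the function name `dinkelbach`, its signature, and the scalar toy problem ($f_1(x) = \log(1+x)$, $f_2(x) = \mu + x$, $x \ge 0$, $\mu = 1$) are hypothetical. The iteration is started from $\lambda_0 = q(x_0)$ for a feasible $x_0$, which guarantees $F(\lambda_0) \ge 0$.

```python
import math

def dinkelbach(f1, f2, argmax_x, x0, tol=1e-12, max_iter=100):
    """Find the root of F(lam) = max_x f1(x) - lam * f2(x).

    argmax_x(lam) must return a maximizer of f1(x) - lam * f2(x) over the feasible set.
    x0 is any feasible point with f1(x0) > 0; lam0 = q(x0) then satisfies F(lam0) >= 0.
    """
    lam = f1(x0) / f2(x0)
    x = x0
    for _ in range(max_iter):
        x = argmax_x(lam)                 # solve the parametric (weighted-sum) problem
        F = f1(x) - lam * f2(x)
        if F <= tol:                      # F(lam) = 0 is the optimality condition (7)
            break
        lam = f1(x) / f2(x)               # Newton update: lam <- q(x)
    return lam, x

MU = 1.0  # hypothetical offset for the toy problem

def f1(x):
    return math.log(1.0 + x)

def f2(x):
    return MU + x

def argmax_x(lam):
    """Stationary point of log(1 + x) - lam * (MU + x), projected onto x >= 0."""
    return max(0.0, 1.0 / lam - 1.0)

lam_star, x_star = dinkelbach(f1, f2, argmax_x, x0=1.0)
```

For this toy problem the iteration should approach $\lambda^* = 1/e$ and $x^* = e - 1$. A box constraint on $\lambda$ can be handled afterwards by clipping the result to $[\lambda_{min}, \lambda_{max}]$, as described in the text.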

B. Parameter-free convex program 

Let $S_0 \subseteq \mathbb{R}^n$ be a nonempty, convex, and open subset of the domain of the objective function $q(x)$ that satisfies
$f_2(x) > 0$. Let $S = \{x \in S_0 \mid g(x) \le 0\}$ be the feasible subset of $S_0$ with all convex inequality constraints
$g(x) \le 0$ taken into account.

The transformation

\[ y = \frac{x}{f_2(x)}, \quad t = \frac{1}{f_2(x)}, \quad x \in S_0 \tag{8} \]

yields the equivalent parameter-free problem [15]

\[ \begin{aligned} \underset{y/t \in S_0}{\text{maximize}} \quad & t f_1(y/t) \\ \text{subject to} \quad & t f_2(y/t) \le 1 \\ & t g(y/t) \le 0, \end{aligned} \tag{9} \]

which is convex in $(y, t)$ since taking the perspective of a function preserves convexity. The inequality in the first
constraint can be changed to an equality if $f_2(x)$ is affine. Problem (5) has an optimal solution if and only if
problem (9) has one, and the solutions are related by (8).

Let the dual variables associated with the constraints $t f_2(y/t) - 1 \le 0$ and $t g(y/t) \le 0$ be denoted by $\lambda$ and
$u$, respectively. The Lagrangian is

\[ \mathcal{L}(y, t, \lambda, u) = -t f_1(y/t) + \lambda \left( t f_2(y/t) - 1 \right) + \left( t g(y/t) \right)^T u \]

and the resulting stationarity conditions are

\[ -\nabla f_1(y^*/t^*) + \lambda^* \nabla f_2(y^*/t^*) + \left( \nabla g(y^*/t^*) \right)^T u^* = 0 \]
\[ -f_1(y^*/t^*) + \lambda^* f_2(y^*/t^*) + \left( g(y^*/t^*) \right)^T u^* = 0. \]

Due to complementary slackness, the last term in the second row is zero. The first row is the condition for the
maximum of $f_1(y/t) - \lambda^* f_2(y/t)$ subject to $y/t \in S$ with $\lambda^*$ as parameter. Thus, the condition for the optimum
is

\[ F(\lambda^*) = \max_{y/t \in S} \; f_1(y/t) - \lambda^* f_2(y/t) = 0. \tag{10} \]

Comparing this to (7), we see that the resulting optimality condition is equivalent to the one in the parametric
approach.
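
C. Dual program

The Wolfe dual of the equivalent convex program (9) is (after substituting $x$ for $y/t$) [16]

\[ \begin{aligned} \underset{x, u, \lambda}{\text{minimize}} \quad & \lambda \\ \text{subject to} \quad & -\nabla f_1(x) + \lambda \nabla f_2(x) + \left( \nabla g(x) \right)^T u = 0 \\ & -f_1(x) + \lambda f_2(x) + \left( g(x) \right)^T u \ge 0 \\ & x \in S_0, \; u \in \mathbb{R}_+^m, \; \lambda \ge 0, \end{aligned} \tag{11} \]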

which coincides [15] with the dual of the parametric convex program

\[ \underset{x \in S}{\text{maximize}} \; f_1(x) - \lambda f_2(x), \tag{12} \]

where $\lambda \in \mathbb{R}$ is treated as a parameter. Thus, (11) is the dual of both convex programs. Note that the dual problem
is not convex in general, since the equality constraint is typically not affine.

Based on Wolfe's direct duality theorem we have the following result [15]: If $x^*$ is an optimal solution to problem
(5) and $S$ is nonempty, then there are $u^*$ and $\lambda^*$ such that $(x^*, u^*, \lambda^*)$ is an optimal solution to the dual problem
and $q(x^*) = \lambda^*$.

At the optimum, the inequality in the dual problem is satisfied with equality, i.e. $-f_1(x^*) + \lambda^* f_2(x^*) +
(g(x^*))^T u^* = 0$. Since $\lambda^* = f_1(x^*)/f_2(x^*)$, due to complementary slackness we have $(g(x^*))^T u^* = 0$. Thus,
problem (11) reduces to finding $x^*$ and the optimal Lagrange multiplier $\lambda^*$ such that

\[ -\nabla f_1(x^*) + \lambda^* \nabla f_2(x^*) + \left( \nabla g(x^*) \right)^T u^* = 0 \]
\[ -f_1(x^*) + \lambda^* f_2(x^*) = 0. \]

The first equation is the condition for the maximum of $f_1(x) - \lambda^* f_2(x)$ over $x \in S$, with $\lambda^*$ as parameter.
Summarizing, the condition for the optimum is

\[ F(\lambda^*) = \max_{x \in S} \; f_1(x) - \lambda^* f_2(x) = 0. \]

Again, this is equivalent to (7).



D. Convex fractional program 

Here we consider the equivalent convex-concave minimization problem with convex inequality constraints. In
this case, we have

\[ \underset{x \in S}{\text{minimize}} \; \frac{f_2(x)}{f_1(x)}, \]

where $S = \{x \in S_0 \mid g(x) \le 0\}$ is bounded, and where $g_i(x)$ is convex and differentiable, $f_1(x)$ is nonnegative and
concave, and $f_2(x)$ is positive and convex on $S$.
Consider the epigraph form of the convex fractional program:

\[ \underset{x \in S, \, \lambda \in \mathbb{R}}{\text{minimize}} \; \lambda \quad \text{subject to} \quad \frac{f_2(x)}{f_1(x)} - \lambda \le 0. \]

Rearranging the constraint, we obtain

\[ \underset{x \in S, \, \lambda \in \mathbb{R}}{\text{minimize}} \; \lambda \quad \text{subject to} \quad f_2(x) - \lambda f_1(x) \le 0. \]

This formulation is not jointly convex, but for a given value of $\lambda$ we have a convex feasibility problem in $x$. The
feasibility problem is solved by minimizing $f_2(x) - \lambda f_1(x)$ and determining if the result is less than or equal to
zero. Note further that the constraint must be active at the optimum, so we have

\[ \min_{x \in S} \; f_2(x) - \lambda^* f_1(x) = 0. \]

The dual problem is given by [17]

\[ \begin{aligned} \underset{x, u, \lambda}{\text{maximize}} \quad & \lambda \\ \text{subject to} \quad & \nabla f_2(x) - \lambda \nabla f_1(x) + \left( \nabla g(x) \right)^T u = 0 \\ & f_2(x) - \lambda f_1(x) + \left( g(x) \right)^T u \ge 0 \\ & x \in S, \; u \in \mathbb{R}_+^m, \; \lambda \ge 0, \end{aligned} \]

which is analogous to (11).

IV. Power models for base stations 

As described in Section II, we are interested in maximizing the ratio of achievable rate to dissipated power,
where the power consists of a transmit-independent part in addition to the total transmit power. We will concentrate
on the generic optimization problem

\[ \text{maximize} \quad q(p) = \frac{r(p)}{\mu + p}, \tag{13} \]

where $p$ is the transmit power spectral density, $r(p)$ is a general concave rate (spectral efficiency in nat/s) function,
and $\mu \ge 0$ is a constant offset, corresponding to the relative weight of the transmit-independent power. The optimal
value of the objective function decreases when $\mu$ increases, because $\mu$ corresponds to a shift to the right of the
curve in Fig. 1. In this section, it will be demonstrated that various EE maximization problems resulting from power
models in the literature can be transformed to the generic problem form (13). While these power models are all
linear, the framework in this paper allows for arbitrary convex functions of transmit power.

A. Generic base station power model 

In [18], a generic model for the total power consumption of a base station $P_{tot}$ is suggested, based on the
assumptions



June 8, 2012 



DRAFT 



9 



1) the total transmit power $P_t$ is equally allocated to the $n_a$ antennas at the transmitter,

2) each antenna is associated with an RF chain, including a power amplifier, $P_{PA}$, and other RF hardware, $P_c$,

3) the power dissipation of each PA is considered proportional to the output power, $P_{PA} = P_t / (n_a \eta_{PA})$.

The model is

\[ P_{tot} = \frac{P_t / \eta_{PA} + n_a P_c + P_{sta}}{\eta_{PS} (1 - \eta_C)}, \]

where $P_{sta}$ is the static power consumption from baseband processing and battery unit, $\eta_{PS}$ is the efficiency of the
power supply, and $\eta_C$ is the efficiency loss in the cooling system.
The EE metric [in bit/J] can be written as

\[ \mathrm{EE} = \frac{B \cdot r(p)}{P_{tot}} = C \cdot \eta_{PA} \cdot \frac{B \cdot r(p)}{B (\mu + p)} = C \cdot \eta_{PA} \cdot q(p), \]

where $B$ is the system bandwidth, $p = P_t / B$, $\mu = \eta_{PA} (n_a P_c + P_{sta}) / B$, and $C = \eta_{PS} (1 - \eta_C)$.

B. Macro base station power model 

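In [19], the following power model for macro and micro base stations is presented:

\[ P_{BS} = N_{Sector} \cdot N_{PApSec} \cdot \left( \frac{P_{TX}}{\mu_{PA}} + P_{SP} \right) \cdot (1 + C_C) \cdot (1 + C_{PSBB}). \]

The main parameters in the model for a macro base station are summarized in Table I.

With $p = P_{TX} / B$, $\mu = \mu_{PA} P_{SP} / B$, and $C = N_{Sector} \cdot N_{PApSec} \cdot (1 + C_C) \cdot (1 + C_{PSBB})$, the EE metric is

\[ \mathrm{EE} = \frac{B \cdot r(p)}{P_{BS}} = \frac{\mu_{PA}}{C} \cdot \frac{B \cdot r(p)}{B (\mu + p)} = \frac{\mu_{PA}}{C} \cdot q(p). \]
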
V. Applications 

In this section, we shall demonstrate how various channel models (flat fading and frequency-selective fading, 
static and time-variant), antenna configurations (including SISO and MIMO), and input constellations (Gaussian 
and quadratic M-QAM) result in concave rate functions that can all be treated within the mathematical framework 
developed thus far. 

A. Time-invariant parallel subchannels 

From Section II, the problem to be solved is

\[ \underset{\mathbf{p} \in \mathcal{S}}{\text{maximize}} \quad q(\mathbf{p}) = \frac{\mathbf{1}^T \mathbf{r}(\mathbf{p})}{\mu + \mathbf{1}^T \mathbf{p}}, \tag{14} \]

where $r_i(p_i) = \log(1 + \gamma_i p_i)$. Here, $\gamma_i = |h_i|^2 / \sigma^2$ is the channel-to-noise ratio (CNR) of subchannel $i$. Furthermore,
we have box constraints for the individual powers, $0 \le p_i \le p_{max}$, $i = 1, \ldots, K$. Thus, the feasible set $\mathcal{S}$ is compact
(closed and bounded) and convex. In order to illustrate the fractional programming theory, we shall solve problem
(14) using both the parametric and the parameter-free approach.
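1) Parametric convex problem: The function $F(\lambda)$ is given by

\[ F(\lambda) = \max_{\mathbf{p} \in \mathcal{S}} \; \mathbf{1}^T \mathbf{r}(\mathbf{p}) - \lambda \left( \mu + \mathbf{1}^T \mathbf{p} \right). \tag{15} \]

The stationarity condition is

\[ \left. \frac{d r_i}{d p_i} \right|_{p_i = p_i^*} - \lambda = 0, \quad i = 1, \ldots, K. \]

Thus, we have

\[ \lambda = \frac{\gamma_i}{1 + \gamma_i p_i^*}. \]

Taking the box constraints into account, the optimal power allocation is

\[ p_i^*(\lambda) = \left[ \frac{1}{\lambda} - \frac{1}{\gamma_i} \right]_0^{p_{max}}. \tag{16} \]

The parameter $\lambda$ corresponds to a cutoff CNR. A subcarrier is not used if its CNR falls below the cutoff value
($\gamma_i < \lambda$). The optimal power is therefore given by water-filling.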

The explicit solution in (16) is used in every iteration of any method that finds the root of $F(\lambda)$. One way of
finding the root is to use the Dinkelbach method, as shown in Algorithm 2.

Referring to Fig. 1, the vertical axis corresponds to the sum rate, whereas the horizontal axis corresponds to sum
power plus an offset $\mu$. A point on the trade-off curve corresponds to water-filling with a given water level $1/\lambda$. A
point below the curve corresponds to a sub-optimal power distribution. The curve crosses the horizontal axis at $\mu$
and the optimal EE occurs where the tangent goes through the origin. When the offset increases, the optimal EE
decreases, and it occurs for a higher sum power.
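The combination of the water-filling allocation with the Dinkelbach update can be sketched as follows. This is a minimal illustration, assuming the box-constrained allocation $p_i^*(\lambda) = [1/\lambda - 1/\gamma_i]_0^{p_{max}}$ derived above; the function names and the example channel gains are hypothetical and not taken from the paper.

```python
import math

def waterfill(lam, gains, p_max):
    """Water-filling with water level 1/lam, clipped to the box [0, p_max]."""
    return [min(p_max, max(0.0, 1.0 / lam - 1.0 / g)) for g in gains]

def sum_rate(p, gains):
    """Sum rate in nats: sum_i log(1 + gamma_i * p_i)."""
    return sum(math.log(1.0 + g * pi) for g, pi in zip(gains, p))

def max_ee_parallel(gains, mu, p_max, tol=1e-12, max_iter=100):
    """Maximize sum_rate(p) / (mu + sum(p)) over 0 <= p_i <= p_max.

    The water level is not tuned to meet a power or rate constraint; instead
    lam is updated until F(lam) = 0.
    """
    p = [p_max] * len(gains)                       # feasible starting point
    lam = sum_rate(p, gains) / (mu + sum(p))       # lam0 = q(p0), hence F(lam0) >= 0
    for _ in range(max_iter):
        p = waterfill(lam, gains, p_max)
        F = sum_rate(p, gains) - lam * (mu + sum(p))
        if F <= tol:
            break
        lam = sum_rate(p, gains) / (mu + sum(p))   # Dinkelbach / Newton update
    return lam, p

# Hypothetical example: three subchannels, the weakest one ends up unused.
GAINS = [4.0, 1.0, 0.1]
lam_star, p_star = max_ee_parallel(GAINS, mu=1.0, p_max=10.0)
```

In this example the cutoff $\lambda^*$ lies above the weakest CNR, so the third subchannel receives no power, while the two stronger subchannels share the same water level $1/\lambda^*$.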

2) Parameter-free convex problem: Remember that $S = \{x \in S_0 \mid g(x) \le 0\}$, where $S_0$ is the part of the
domain of the objective function whose denominator is positive. In our case, the domain can be characterized as
follows: The logarithmic function is only defined for the positive real domain, which implies $p_i > -1/\gamma_i$, and the
denominator cannot be zero, so $\mathbf{1}^T \mathbf{p} \ne -\mu$. The requirement that the denominator be positive excludes all vectors
with a sum less than or equal to $-\mu$.

By the transformation

\[ \mathbf{y} = \frac{\mathbf{p}}{\mu + \mathbf{1}^T \mathbf{p}}, \quad t = \frac{1}{\mu + \mathbf{1}^T \mathbf{p}}, \quad \mathbf{p} \in S_0, \]

we obtain the convex problem

\[ \begin{aligned} \underset{\mathbf{y}/t \in S_0}{\text{maximize}} \quad & t \mathbf{1}^T \mathbf{r}(\mathbf{y}/t) \\ \text{subject to} \quad & t \mu + \mathbf{1}^T \mathbf{y} = 1 \\ & t \mathbf{g}(\mathbf{y}/t) \le 0, \end{aligned} \]

where $r_i(y_i/t) = \log(1 + \gamma_i \cdot y_i/t)$ and $\mathbf{g}$ is a vector of box constraints $0 \le y_i/t \le p_{max}$. Here, the variable $t$
corresponds to the inverse of the total power dissipation.






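After introduction of a Lagrange multiplier $\lambda \in \mathbb{R}$ for the equality constraint, the Lagrangian is

\[ \mathcal{L}(\mathbf{y}, t, \lambda, \mathbf{u}) = -t \mathbf{1}^T \mathbf{r}(\mathbf{y}/t) + \lambda \left( t \mu + \mathbf{1}^T \mathbf{y} - 1 \right) + \left( t \mathbf{g}(\mathbf{y}/t) \right)^T \mathbf{u}. \]

As the reader can verify, the KKT conditions yield

\[ \frac{y_i^*}{t^*} = \left[ \frac{1}{\lambda^*} - \frac{1}{\gamma_i} \right]_0^{p_{max}} \]

and $\lambda^* = t^* \mathbf{1}^T \mathbf{r}(\mathbf{y}^*/t^*)$.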

3) Adding constraints: As discussed previously, a maximum power constraint $\mathbf{1}^T \mathbf{p} \le P$ corresponds to a lower
bound $\lambda_{min}$ for $\lambda$. Similarly, a minimum rate constraint $\mathbf{1}^T \mathbf{r} \ge R$ corresponds to an upper bound $\lambda_{max}$. As illustrated
in Figure 2, these additional constraints lead to a penalty in EE.

4) Flat fading channel: For the flat fading channel, the optimal power allocation reduces to

\[ p^*(\lambda) = \left[ \frac{1}{\lambda} - \frac{1}{\gamma} \right]_0^{p_{max}}. \tag{17} \]

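For this simple channel model, it is in fact possible to derive the optimal value $\lambda^*$ in closed form. Assume first
that $\gamma > \lambda^*$, so that $p^* > 0$. Again, we wish to find the solution to the nonlinear equation $F(\lambda^*) = 0$, i.e.

\[ \log \frac{\gamma}{\lambda^*} - \lambda^* \left( \mu + \frac{1}{\lambda^*} - \frac{1}{\gamma} \right) = 0. \]

After introduction of $s = \gamma / \lambda^*$ this can be transformed to

\[ (\log s - 1) \cdot s = \mu \gamma - 1. \]

The solution to this equation is

\[ s = \exp\left( 1 + W\left( e^{-1} (\mu \gamma - 1) \right) \right), \]

where $W$ is the Lambert W function [20]. Note that the condition $p^* > 0$ corresponds to $s > 1$, which implies
$W(e^{-1}(\mu \gamma - 1)) > -1$, i.e. the principal branch $W_0$ is selected. Thus,

\[ \lambda^* = \frac{\gamma}{s} = \frac{\gamma}{\exp\left( 1 + W_0\left( e^{-1} (\mu \gamma - 1) \right) \right)}. \]

When there are no constraints on rate and power, there is always a feasible solution.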

Although the solution can be derived analytically for the flat-fading channel, it may still be attractive to use 
the Dinkelbach method for numerical evaluation, since evaluation of the Lambert W function also relies on a 
root-finding algorithm. 
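The closed-form expression can be checked numerically with a short sketch. To keep the example self-contained, the principal branch $W_0$ is evaluated here by a plain Newton iteration on $w e^w = z$; this helper, the function names, and the example values of $\mu$ and $\gamma$ are assumptions made for this illustration. The sketch further assumes $\mu\gamma > 0$ and that the constraint $p \le p_{max}$ is inactive.

```python
import math

def lambert_w0(z, tol=1e-14, max_iter=100):
    """Principal branch W0(z) for z > -1/e, by Newton iteration on w * exp(w) = z.

    The start value log(1 + z) lies to the right of the root, so the iteration
    decreases monotonically (w * exp(w) is increasing and convex for w > -1).
    """
    if z <= -1.0 / math.e:
        raise ValueError("this sketch requires z > -1/e")
    w = math.log1p(z)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

def lambda_star_flat(mu, gamma):
    """Closed-form optimal lambda for the flat-fading channel (no active p_max)."""
    s = math.exp(1.0 + lambert_w0((mu * gamma - 1.0) / math.e))
    return gamma / s

def F_flat(lam, mu, gamma):
    """F(lam) for the flat-fading channel with p = 1/lam - 1/gamma > 0."""
    p = 1.0 / lam - 1.0 / gamma
    return math.log(1.0 + gamma * p) - lam * (mu + p)

# Hypothetical example values.
lam_a = lambda_star_flat(mu=1.0, gamma=1.0)   # W0(0) = 0, so lam* = 1/e
lam_b = lambda_star_flat(mu=0.5, gamma=4.0)
```

Evaluating `F_flat` at the returned values should give zero up to rounding, which confirms that the closed form satisfies the optimality condition $F(\lambda^*) = 0$.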



B. Time-varying, flat-fading channel 

Here, we wish to maximize the average number of bits transmitted per unit energy consumed, calculated as the
ergodic capacity divided by the average dissipated power. We assume causal CSI at the transmitter in an ideal case
with zero-delay feedback which requires no additional power. The EE maximization problem can be stated as

\[ \underset{p(\gamma) \ge 0}{\text{maximize}} \quad q[p(\gamma)] = \frac{\int_0^\infty \log(1 + \gamma p(\gamma)) f(\gamma) \, d\gamma}{\mu + \int_0^\infty p(\gamma) f(\gamma) \, d\gamma}, \tag{18} \]

where $f(\gamma)$ is the probability density function (PDF) of the fading distribution. Note that optimization problem
(18) is concerned with finding an optimal function rather than a finite-dimensional vector as assumed in Section
III. However, the extension to optimization over functions is straightforward. The parametric convex optimization
problem is

\[ \underset{p(\gamma) \ge 0}{\text{maximize}} \quad \int_0^\infty \log(1 + \gamma p(\gamma)) f(\gamma) \, d\gamma - \lambda \left( \mu + \int_0^\infty p(\gamma) f(\gamma) \, d\gamma \right), \tag{19} \]

where $\lambda \ge 0$ is treated as a parameter.

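Problem (19) needs to be solved in each step of the Dinkelbach method. The stationarity condition (obtained by
setting the functional derivative with respect to $p$ equal to zero) is

\[ \frac{\gamma}{1 + \gamma p^*(\gamma)} - \lambda = 0. \]

Solving this equation for $p^*$, we get

\[ p^* = \frac{1}{\lambda} - \frac{1}{\gamma}. \]

The transmit power must be nonnegative, so the solution is

\[ p^*(\gamma) = \left[ \frac{1}{\lambda} - \frac{1}{\gamma} \right]^+, \]

and $\lambda$ corresponds to a cutoff CNR. Thus, we have

\[ F(\lambda) = \int_\lambda^\infty \log\left( \frac{\gamma}{\lambda} \right) f(\gamma) \, d\gamma - \lambda \left( \mu + \int_\lambda^\infty \left( \frac{1}{\lambda} - \frac{1}{\gamma} \right) f(\gamma) \, d\gamma \right). \tag{20} \]

The solution to $F(\lambda) = 0$ must be found numerically because no closed-form solutions exist for typical continuous
distributions. However, evaluating $F(\lambda)$ numerically for any given $\lambda$ is straightforward. Therefore, the optimal value
$\lambda^*$ can be found iteratively.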

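If the instantaneous CNR is below the cutoff level, the optimal strategy at the transmitter is to be idle. The idle
probability is calculated as

\[ P(\gamma < \lambda^*) = 1 - \int_{\lambda^*}^\infty f(\gamma) \, d\gamma. \tag{21} \]

1) Adding constraints: A maximum power constraint $\bar{p} \le p_{max}$ is equivalent to $\lambda \ge \lambda_{min}$, where $\lambda_{min}$ satisfies

\[ \int_{\lambda_{min}}^\infty \left( \frac{1}{\lambda_{min}} - \frac{1}{\gamma} \right) f(\gamma) \, d\gamma = p_{max}. \]

Similarly, a minimum rate constraint $\bar{r} \ge r_{min}$ is equivalent to $\lambda \le \lambda_{max}$, where $\lambda_{max}$ satisfies

\[ \int_{\lambda_{max}}^\infty \log\left( \frac{\gamma}{\lambda_{max}} \right) f(\gamma) \, d\gamma = r_{min}. \]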
2) Example: Rayleigh fading: In Rayleigh fading, the PDF $f(\gamma)$ is [21]

\[ f(\gamma) = \frac{1}{\bar{\gamma}} e^{-\gamma / \bar{\gamma}}. \tag{22} \]

_ _ 2 

The average CNR 7 is given by 7 = G • ^ , where G is the path gain from the transmitter to the receiver and a 
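is the mean of the Rayleigh distributed variable. Substituting (22) into (20) yields

\[ F(\lambda) = \int_\lambda^\infty \log\left( \frac{\gamma}{\lambda} \right) \frac{e^{-\gamma/\bar{\gamma}}}{\bar{\gamma}} \, d\gamma - \lambda \left( \mu + \int_\lambda^\infty \left( \frac{1}{\lambda} - \frac{1}{\gamma} \right) \frac{e^{-\gamma/\bar{\gamma}}}{\bar{\gamma}} \, d\gamma \right). \]

After the variable transformation $x = \lambda / \bar{\gamma}$, $t = \gamma / \lambda$ we obtain

\[ F(\lambda) = \int_1^\infty \log t \cdot x e^{-xt} \, dt - \lambda \left( \mu + \frac{1}{\bar{\gamma}} \int_1^\infty \left( 1 - \frac{1}{t} \right) e^{-xt} \, dt \right), \]

where $x$ and $t$ are functions of $\lambda$. Through integration by parts, we have

\[ \int_1^\infty \log t \cdot x e^{-xt} \, dt = \left[ \log t \cdot \left( -e^{-xt} \right) \right]_1^\infty - \int_1^\infty \frac{1}{t} \left( -e^{-xt} \right) dt = \int_1^\infty \frac{1}{t} e^{-xt} \, dt = E_1(x), \]

where the generalized exponential integral $E_n(x)$ is defined by

\[ E_n(x) = \int_1^\infty \frac{e^{-xt}}{t^n} \, dt. \]

Thus,

\[ F(\lambda) = E_1\left( \frac{\lambda}{\bar{\gamma}} \right) - \lambda \left( \mu + \frac{1}{\bar{\gamma}} \left( E_0\left( \frac{\lambda}{\bar{\gamma}} \right) - E_1\left( \frac{\lambda}{\bar{\gamma}} \right) \right) \right), \]

where $E_0(x) = \int_1^\infty e^{-xt} \, dt = e^{-x} / x$.

The idle probability for Rayleigh fading is given by

\[ P(\gamma < \lambda^*) = 1 - \int_{\lambda^*}^\infty \frac{e^{-\gamma/\bar{\gamma}}}{\bar{\gamma}} \, d\gamma = 1 - \frac{\lambda^*}{\bar{\gamma}} E_0\left( \frac{\lambda^*}{\bar{\gamma}} \right) = 1 - \exp\left( -\frac{\lambda^*}{\bar{\gamma}} \right). \]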
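The Rayleigh-fading case can be evaluated numerically with the following sketch. It assumes the exponential CNR density $f(\gamma) = e^{-\gamma/\bar{\gamma}}/\bar{\gamma}$ and uses the identity $F(\lambda) = (1 + x) E_1(x) - e^{-x} - \lambda\mu$ with $x = \lambda/\bar{\gamma}$, which follows from (20) by direct integration. The power-series evaluation of $E_1$, the use of bisection (any root finder for $F$ would do), the function names, and the example parameters are choices made for this illustration; the series is only adequate for moderate arguments, so the sketch refuses to search beyond $x = 8$.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(x, terms=60):
    """E1(x) = -gamma - log(x) - sum_{k>=1} (-x)^k / (k * k!), for 0 < x <= 8."""
    total = 0.0
    term = 1.0                      # holds (-x)^k / k!
    for k in range(1, terms + 1):
        term *= -x / k
        total -= term / k
    return -EULER_GAMMA - math.log(x) + total

def F_rayleigh(lam, mu, gamma_bar):
    """F(lam) for Rayleigh fading with average CNR gamma_bar."""
    x = lam / gamma_bar
    return (1.0 + x) * exp_integral_e1(x) - math.exp(-x) - lam * mu

def solve_rayleigh(mu, gamma_bar, tol=1e-13):
    """Root of the strictly decreasing function F by bisection."""
    lo, hi = 1e-9 * gamma_bar, gamma_bar
    while F_rayleigh(hi, mu, gamma_bar) > 0.0:
        hi *= 2.0
        if hi / gamma_bar > 8.0:
            raise ValueError("root outside the range supported by this sketch")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F_rayleigh(mid, mu, gamma_bar) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example values.
lam_star = solve_rayleigh(mu=1.0, gamma_bar=1.0)
idle_probability = 1.0 - math.exp(-lam_star / 1.0)
```

For $\mu = 1$ and $\bar{\gamma} = 1$ the cutoff lies between 0.3 and 0.4, so under these assumed values the transmitter would be idle roughly 30% of the time.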

C. Time-varying, parallel subchannels 

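Suppose we have $K$ parallel channels, as in the case of frequency-selective multicarrier systems. Additionally,
the channels vary with time and the power allocation can be selected independently for every channel realization
$\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_K)$. We can characterize the power allocation as the vector function of the channel realization $\mathbf{p}(\boldsymbol{\gamma})$.
As previously discussed, we want to maximize the EE $q$, which is quantified here as the ratio of the ergodic capacity
to the average dissipated power, over the vector function $\mathbf{p}(\boldsymbol{\gamma})$. The maximization problem is then given as

\[ \text{maximize} \quad q[\mathbf{p}(\boldsymbol{\gamma})] = \frac{\int_{\boldsymbol{\gamma} \in \mathbb{R}_+^K} \sum_{i=1}^{K} \log(1 + \gamma_i p_i(\boldsymbol{\gamma})) f(\boldsymbol{\gamma}) \, d\boldsymbol{\gamma}}{\mu + \int_{\boldsymbol{\gamma} \in \mathbb{R}_+^K} \mathbf{1}^T \mathbf{p}(\boldsymbol{\gamma}) f(\boldsymbol{\gamma}) \, d\boldsymbol{\gamma}}, \tag{23} \]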
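where $f(\boldsymbol{\gamma}) = f(\gamma_1, \ldots, \gamma_K)$ is the joint PDF of the $K$ subchannel CNRs. The corresponding parametric concave
optimization problem with parameter $\lambda$ is

\[ \text{maximize} \quad \int_{\boldsymbol{\gamma} \in \mathbb{R}_+^K} \sum_{i=1}^{K} \log(1 + \gamma_i p_i(\boldsymbol{\gamma})) f(\boldsymbol{\gamma}) \, d\boldsymbol{\gamma} - \lambda \left( \mu + \int_{\boldsymbol{\gamma} \in \mathbb{R}_+^K} \mathbf{1}^T \mathbf{p}(\boldsymbol{\gamma}) f(\boldsymbol{\gamma}) \, d\boldsymbol{\gamma} \right), \tag{24} \]

which has to be solved at each step of the Dinkelbach method. Since the maximand of (24) is a concave functional
of $\mathbf{p}(\boldsymbol{\gamma})$, the KKT conditions are sufficient for optimality. These conditions yield the optimal function

\[ p_i^*(\lambda, \boldsymbol{\gamma}) = \left[ \frac{1}{\lambda} - \frac{1}{\gamma_i} \right]^+, \quad i = 1, \ldots, K. \tag{25} \]

Note that $p_i^*$ is an explicit expression of the component $\gamma_i$ only and not of the vector $\boldsymbol{\gamma}$.

We now obtain the solution to (23) by finding $\lambda^*$ using (25). This is done by computing the root of the function

\[ F(\lambda) = \int_{\boldsymbol{\gamma} \in \mathbb{R}_+^K} \sum_{i=1}^{K} \log(1 + \gamma_i p_i^*(\lambda, \boldsymbol{\gamma})) f(\boldsymbol{\gamma}) \, d\boldsymbol{\gamma} - \lambda \left( \mu + \int_{\boldsymbol{\gamma} \in \mathbb{R}_+^K} \sum_{i=1}^{K} p_i^*(\lambda, \boldsymbol{\gamma}) f(\boldsymbol{\gamma}) \, d\boldsymbol{\gamma} \right) \tag{26} \]

using the Dinkelbach method. The computation of the integrals may be demanding, especially for $K > 3$. However,
the computation time can be reduced by exploiting the structure of $f(\boldsymbol{\gamma})$, e.g. if the parallel subchannels are
independent, $f(\boldsymbol{\gamma})$ can be written as a product of the PDFs of its components $\gamma_i$.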

Analogously to Section V-B1, average sum power and sum rate constraints can be easily imposed here as well.
Moreover, this method can be applied to MIMO channels, which are decomposed into parallel channels using
singular-value decomposition [22]. The case of Rayleigh fading channels has been treated in [23].



D. Gap to capacity 

The Shannon capacity models the theoretically achievable rate for an ideal Gaussian input. In a real system, 
the achievable rate is often modeled using a gap depending on the modulation and coding schemes being used. In 
addition, a gap can be used to model the uncertainty in the received SNR. 

1) Constant gap to capacity: The simplest variation of the rate function is to introduce a constant gap to capacity,
as suggested in [6]. The rate function then becomes

\[ r_i(p_i) = \log\left( 1 + \frac{\gamma_i}{\Gamma} p_i \right), \]

where $\Gamma$ is the gap to capacity. Note that $\Gamma$ is independent of the subcarrier CNR. The simplest way of including
such a gap is to exchange $\gamma_i$ for $\gamma_i / \Gamma$ in the water-filling solution.

2) Subchannel-dependent gaps (mercury/water-filling): For an arbitrary modulation scheme, the rate function is
described by the mutual information expression. In the following, the approach is described for parallel channels
following [24]. It can be generalized to multiple antenna systems [25].

The input signals on the i-th channel, $s_i$ (normalized to unit power), are drawn from some modulation set $\mathcal{M}_i$, which
can be discrete as well as continuous. The rate function is defined as the mutual information between input and
output of the channel,

$$r_i(p_i) = I\bigl(s_i; \sqrt{\gamma_i p_i}\, s_i + n_i\bigr), \quad \text{[nats/channel use]} \tag{27}$$

where $\gamma_i = |h_i|^2/\sigma^2$, $n_i$ is a zero-mean unit-variance proper complex Gaussian random variable and $p_i$ is the power
allocated to the i-th channel. The mutual information in (27) is strictly concave in $p_i$ [24, Appendix A]. In general,
it is difficult to obtain a closed-form expression for the mutual information. However, all optimization problems in
the last section can be generalized by the following observation [26]: If the signal-to-noise ratio on the i-th channel
is denoted by $\rho_i = \gamma_i p_i$, then

$$\frac{d}{d\rho_i}\, r_i(\rho_i) = \mathrm{MMSE}_i(\rho_i), \tag{28}$$

where $\mathrm{MMSE}_i(\rho_i) = E\bigl[|s_i - \hat{s}_i|^2\bigr]$ with MMSE estimate $\hat{s}_i = E\bigl[s_i \mid \sqrt{\rho_i}\, s_i + n_i = y_i\bigr]$. The MMSE is known
in closed form for many important discrete and continuous constellations [24, Section IV] and these expressions
can be inserted into the KKT optimality conditions. In order to solve for the optimal power allocation, the inverse
MMSE function $\mathrm{MMSE}_i^{-1}(\cdot)$ is used.
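The parametric convex program is

$$F(\lambda) = \max_{\mathbf{p} \in S} \; \mathbf{1}^T \mathbf{r}(\mathbf{p}) - \lambda\bigl(\mu + \mathbf{1}^T \mathbf{p}\bigr),$$

where $[\mathbf{r}(\mathbf{p})]_i = r_i(p_i)$ according to (27) and $\lambda \in \mathbb{R}$ is treated as a parameter. The stationarity condition is

$$\left.\frac{dr_i}{dp_i}\right|_{p_i = p_i^*} - \lambda = 0, \quad i = 1,\ldots,K.$$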



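Inserting (28), we have

$$\gamma_i\, \mathrm{MMSE}_i(\gamma_i p_i^*) = \lambda, \quad i = 1,\ldots,K,$$

i.e. the MMSE of subchannel i at the optimum power $p_i^*$ is given by

$$\mathrm{MMSE}_i(\gamma_i p_i^*) = \frac{\lambda}{\gamma_i}.$$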

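Considering the constraints $p_i \ge 0$, the optimum powers are given explicitly by

$$p_i^* = \begin{cases} \dfrac{1}{\gamma_i}\, \mathrm{MMSE}_i^{-1}(\zeta_i), & \zeta_i < 1 \\ 0, & \zeta_i \ge 1 \end{cases}$$

where $\zeta_i = \lambda/\gamma_i$. This solution has a graphical interpretation analogous to conventional water-filling [24], with
$\gamma_i$ exchanged for $\gamma_i/\Gamma_i(\zeta_i)$, where

$$\Gamma_i(\zeta_i) = \begin{cases} 1/\zeta_i - \mathrm{MMSE}_i^{-1}(\zeta_i), & \zeta_i < 1 \\ 1, & \zeta_i \ge 1 \end{cases}$$

is the gap with respect to an ideal Gaussian signal. For Gaussian inputs, $\Gamma_i = 1$.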

The $\lambda$ that maximizes the EE is obtained by finding the root of $F(\lambda)$. The rate functions are computed through
integration of the MMSE over $\rho$,

$$r_i(\rho_i) = \int_0^{\rho_i} \mathrm{MMSE}_i(\rho)\, d\rho.$$

As already mentioned, the MMSE can be evaluated for discrete constellations in a semi-analytical form involving
some simple integrals. For a real-time implementation the values of $\mathrm{MMSE}_i^{-1}(\cdot)$ and $r_i(\cdot)$ can be tabulated for the
constellations of interest. The function $F(\lambda)$ is then evaluated as follows:

1) Calculate $\zeta_i$ for all subcarriers

2) Use the table of $\mathrm{MMSE}_i^{-1}(\cdot)$ to find $p_i^*$ for all subcarriers

3) Use the table of $r_i(\cdot)$ to find $r_i(\gamma_i p_i^*)$ for all subcarriers

4) Use $r_i(\gamma_i p_i^*)$ and $p_i^*$ to calculate $F(\lambda)$
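
The four steps above can be sketched in Python as follows. To keep the example checkable, the tables are filled with the Gaussian-input expressions $\mathrm{MMSE}(\rho) = 1/(1+\rho)$ and $r(\rho) = \log(1+\rho)$ as a stand-in; for a discrete constellation they would hold numerically computed values instead. The grid range, the grid resolution and the function name are assumptions.

```python
import numpy as np

# Tables on an SNR grid (stand-in: Gaussian inputs, an assumption).
rho_grid = np.linspace(0.0, 1000.0, 200001)
mmse_tab = 1.0 / (1.0 + rho_grid)   # decreasing in rho
rate_tab = np.log1p(rho_grid)       # integral of the MMSE from 0 to rho

def F_from_tables(lam, gamma, mu):
    """Evaluate F(lam) and the power allocation by table lookup."""
    gamma = np.asarray(gamma, dtype=float)
    zeta = lam / gamma                                   # step 1
    # Step 2: invert the decreasing MMSE table. np.interp needs
    # increasing abscissae, hence the reversed arrays. zeta >= 1
    # maps to rho = 0, i.e. the subcarrier stays unused.
    rho = np.interp(np.minimum(zeta, 1.0), mmse_tab[::-1], rho_grid[::-1])
    p = rho / gamma
    r = np.interp(rho, rho_grid, rate_tab)               # step 3
    F = np.sum(r) - lam * (mu + np.sum(p))               # step 4
    return F, p
```

For the Gaussian stand-in the lookup reproduces conventional water-filling up to interpolation error, which is a convenient sanity check before swapping in tables for a discrete constellation.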

E. Nested convex problem 

Many solutions (whether closed-form or algorithmic) to maximization of rate functions given a sum power 
constraint in various scenarios are available in the literature. A well-known example of this is rate maximization 
over parallel channels. The solution is water-filling, where the water level is a function of the dual variable, which 
can be computed using known algorithms [27]. An EE optimization problem can be reduced to a one-dimensional
convex problem using a variable transformation, which allows the known results to be utilized. We will illustrate this using
the example of mercury/water-filling. 

For any optimization problem, we can first optimize over some of the variables and then over the remaining ones
[Sec. 4.1.3, p. 133]. Thus, the problem can be reformulated as

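$$\text{maximize} \quad t f_1\bigl(y^*(t)/t\bigr), \tag{29}$$

where

$$y^*(t) = \arg\max_{y} \bigl\{ f_1(y/t) : t f_2(y/t) \le 1,\ y/t \in S \bigr\} = t\,x^*(t) = t \arg\max_{x} \bigl\{ f_1(x) : f_2(x) \le 1/t,\ x \in S \bigr\}. \tag{30}$$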

Since the original problem is convex, the new problem is convex as well. 
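
A minimal Python sketch of the nesting in (29) and (30), assuming Gaussian inputs on parallel channels so that the inner problem is conventional water-filling under the power budget $1/t - \mu$. The bisection on the water level, the ternary search over $t$ and all numerical values are illustrative choices, not the algorithms referred to in the text.

```python
import numpy as np

def max_rate(P, gamma, iters=60):
    """Inner problem: maximize sum_i log(1 + gamma_i p_i), sum_i p_i <= P."""
    lo, hi = 0.0, P + np.max(1.0 / gamma)    # bracket for the water level
    for _ in range(iters):
        w = 0.5 * (lo + hi)
        if np.sum(np.maximum(w - 1.0 / gamma, 0.0)) > P:
            hi = w
        else:
            lo = w
    p = np.maximum(lo - 1.0 / gamma, 0.0)
    return float(np.sum(np.log1p(gamma * p))), p

def nested_ee(gamma, mu, iters=100):
    """Outer problem: maximize t * f1(x*(t)) over 0 < t <= 1/mu."""
    def g(t):
        P = max(1.0 / t - mu, 0.0)           # budget implied by f2(x) <= 1/t
        return t * max_rate(P, gamma)[0]
    a, b = 1e-6, 1.0 / mu
    for _ in range(iters):                   # ternary search, g is concave
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if g(m1) < g(m2):
            a = m1
        else:
            b = m2
    t = 0.5 * (a + b)
    return g(t), max_rate(max(1.0 / t - mu, 0.0), gamma)[1]

gamma = np.array([4.0, 1.0, 0.2])            # assumed channel gains
ee, p = nested_ee(gamma, mu=1.0)
```

Every evaluation of the outer objective triggers a full inner optimization, which is the computational cost of reusing a known rate maximization routine unchanged.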

As shown in [24], the optimal power allocation for the maximization of the sum rate (or mutual information) 
over parallel channels for an arbitrary modulation scheme, i.e. 

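$$\mathbf{p}^* = \arg\max_{\mathbf{p} \ge 0,\ \sum_i p_i \le P} \; \sum_{i=1}^{K} r_i(p_i),$$

where $r_i(p_i)$ is given by (27), is

$$p_i^* = \frac{1}{\gamma_i}\, \mathrm{MMSE}_i^{-1}\left(\min\left\{1, \frac{\eta}{\gamma_i}\right\}\right), \quad i = 1,\ldots,K,$$

where $\eta$ is the unique solution to the equation

$$\sum_{i=1}^{K} \frac{1}{\gamma_i}\, \mathrm{MMSE}_i^{-1}\left(\min\left\{1, \frac{\eta}{\gamma_i}\right\}\right) = P. \tag{31}$$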



Let us denote the maximum rate function by $f_1(\mathbf{p}^*(P))$, which is evaluated algorithmically for any given
sum power $P > 0$. Now we want to solve the problem

$$\underset{P \ge 0}{\text{maximize}} \quad \frac{f_1(\mathbf{p}^*(P))}{\mu + P}$$

with $\mu > 0$. Applying (29) and the known solution $\mathbf{p}^*$, we obtain

$$\underset{0 < t \le 1/\mu}{\text{maximize}} \quad t \sum_{i=1}^{K} r_i\bigl(p_i^*(1/t - \mu)\bigr), \tag{32}$$

where $t = (\mu + P)^{-1}$.

Note that the optimal power allocation for EE maximization is functionally identical to that of rate maximization.
The difference between them is that $\eta$ is chosen to achieve the highest EE in the former, whereas $\eta$ is
chosen to fulfill the sum power constraint in the latter.

A similar nesting approach was proposed in [28], where the EE problem with any concave rate function is
reduced to a one-dimensional quasiconvex problem. Here it is formulated as a one-dimensional convex problem.

This approach has the advantage that known rate maximization results can be easily implemented with almost no
analysis required for maximizing the EE. However, doing some pre-analysis of the original EE optimization problem
enables it to be solved with less computational cost. In solving (32), every iteration for finding the optimal $t$ requires
solving (31) to obtain $\mathbf{p}^*(1/t - \mu)$. In the approach presented in Section IV-D2, however, no inner optimization is
required because $\mathbf{p}^*$ is derived explicitly as a function of $\lambda$. Thus, the optimization can be carried out directly over
the dual variable $\lambda$ and the maximum EE is obtained more efficiently. On the other hand, if such a pre-analysis
cannot be done, or if the computation time is not an essential criterion, the nesting method may be attractive.

VI. Simulation

A. Time-varying channel with varying number of antennas

Let us consider a time-varying frequency-flat MIMO link with $n_T$ and $n_R$ transmit and receive antennas,
respectively, where the link between each transmitter and receiver antenna is subject to Rayleigh fading. We assume
that perfect causal channel information is available at both ends. As previously mentioned, this can be transformed
to parallel channels using singular-value decomposition. Using the result from Section IV-C and the generic base
station power model in Section IV-A, we optimize the EE over the transmit power for various antenna configurations
and observe how the optimal EE changes with the circuit power $P_c$. The bandwidth is set at $B = 200$ kHz, and
the noise power density at $N_0 = -104.5$ dBm/Hz. We assume the power amplifier efficiency to be $\eta_{\mathrm{PA}} = 0.35$.
The other constants in the power model are chosen according to values presented in [18]: $\eta_{\mathrm{C}} = 0.95$, $\eta_{\mathrm{PS}} = 0.9$,
In Fig. 3 we observe that for an equal number of antennas ($n_T = n_R = n$) on both ends, it is more efficient to
employ more antennas in this setting. Notice also that EE* decreases monotonically with $P_c$. This is in agreement
with results in [29], although there the antenna configuration is considered to be energy-efficient if it yields a
small energy-per-goodbit given a maximum tolerated outage probability. It is shown there that for Rayleigh fading,
selecting the balanced MIMO configuration with the highest $n$ gives the best EE, but this is not the case for Rician
fading. Due to higher correlation between the transmit and receive antennas in Rician fading, lower rates are achieved
and therefore the employment of more antennas (which incur higher circuit power consumption) deteriorates the EE.

It is also interesting to note that if $P_c = 0$, i.e. if the circuit power does not depend on the number of antennas,
EE* increases linearly with $n$.

In Fig. 4 we simulate the case where the receiver has only one antenna. Again, EE* decreases with $P_c$. However,
it is not always best to choose the largest number of transmit antennas. As can be seen in the inset, employing
the highest $n_T$ is efficient only if $P_c$ is small. This is intuitive since when $P_c$ is small, it does not cost much
more power to employ more antennas. As $P_c$ increases, the loss in EE by employing more antennas increases as
well. The reason for this is that when $n_R = 1$ and $P_c$ is nonzero, the transmission rate scales sublinearly with $n_T$,
whereas the power consumption scales linearly with it. As $P_c$ becomes larger, the difference between the gain in
EE (through the increase of the transmission rate by increasing $n_T$) and the loss caused by the more rapid increase
in power consumption becomes larger as well.

The overall conclusion from the assessment in Figures 3 and 4 is that one should carefully consider whether or
not to activate each antenna with the required RF chain. As a rule of thumb, additional antennas at the transmitter
and receiver side should be activated only if the resulting rate gain outweighs the added circuit power. Contrary to
the traditional point of view, having more antennas is not always better. An additional diversity gain (Fig. 4) does
not always justify the additional energy consumption; it depends on the operating point. In contrast, the additional
degree of freedom or multiplexing gain in Fig. 3 motivates the activation of more antennas.

B. Quadratic m-QAM 

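In the presence of Gaussian noise, the MMSE for an m-ary discrete constellation is

$$\mathrm{MMSE}(\rho) = 1 - \frac{1}{\pi} \int \frac{\left|\sum_{l=1}^{m} q_l s_l\, e^{-|y - \sqrt{\rho}\, s_l|^2}\right|^2}{\sum_{l=1}^{m} q_l\, e^{-|y - \sqrt{\rho}\, s_l|^2}}\, dy,$$

where $q_l$ are probabilities and the integral is over the complex field.
For m-PAM, we have $q_l = 1/m$ and

$$s_l = (2l - 1 - m)\sqrt{\frac{3}{m^2 - 1}}, \quad l = 1,\ldots,m.$$

For even m, the corresponding m-QAM consists of two m/2-PAM constellations in quadrature, each with half the
power. Writing $y$ as $y_I + j y_Q$, it can be shown that integration over the quadrature component $y_Q$ yields $\sqrt{\pi}$. Thus,
for m-PAM we have

$$\mathrm{MMSE}(\rho) = 1 - \frac{1}{m\sqrt{\pi}} \int_{-\infty}^{\infty} \frac{\left(\sum_{l=1}^{m} s_l\, e^{-(y_I - \sqrt{\rho}\, s_l)^2}\right)^2}{\sum_{l=1}^{m} e^{-(y_I - \sqrt{\rho}\, s_l)^2}}\, dy_I.$$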

The values of $\mathrm{MMSE}(\rho)$ are evaluated numerically for various m-QAM constellations. Using this result, $\mathrm{MMSE}_i^{-1}(\cdot)$
and $r_i(\cdot)$ are tabulated. The EE of a flat fading channel is optimized according to the method detailed in Section
IV-D2. The resulting trade-off curve with $\mu = 1$ is shown in Figure 5. If $\mu$ is independent of the modulation scheme,
it is always beneficial to use a higher modulation order since there is no cost associated with using a higher order
modulation scheme. For small values of $\mu$, the curves start at a point close to the origin and the optimal EE is
approximately equal for the different schemes, whereas the difference increases for larger values of $\mu$. The value
of $p^*$ is also higher for higher-order modulation schemes.
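
One way to carry out the numerical evaluation is sketched below in Python for m-PAM. It assumes equispaced, equiprobable, unit-power constellation points and in-phase noise with density $e^{-y^2}/\sqrt{\pi}$, and it approximates the integral by a Riemann sum on a uniform grid; the function names and the grid size are hypothetical. The integrand is written as the squared conditional-mean estimate weighted by the density of $y_I$, which is algebraically the same ratio of sums but avoids dividing two underflowing exponentials.

```python
import numpy as np

def pam_points(m):
    """Equispaced, zero-mean, unit-power m-PAM points (assumed labeling)."""
    l = np.arange(1, m + 1)
    return (2 * l - 1 - m) * np.sqrt(3.0 / (m**2 - 1))

def mmse_pam(rho, m, n_grid=4001):
    """MMSE of m-PAM at SNR rho, by numerical integration over y_I."""
    s = pam_points(m)
    a = np.sqrt(rho) * s                          # noiseless received points
    L = np.max(np.abs(a)) + 8.0                   # tails beyond L are negligible
    y = np.linspace(-L, L, n_grid)
    dy = y[1] - y[0]
    expo = -(y[:, None] - a[None, :])**2          # exponents for every (y, l)
    w = np.exp(expo - expo.max(axis=1, keepdims=True))
    s_hat = (w @ s) / w.sum(axis=1)               # conditional mean E[s | y]
    f_y = np.exp(expo).sum(axis=1) / (m * np.sqrt(np.pi))   # density of y_I
    return 1.0 - np.sum(s_hat**2 * f_y) * dy
```

Evaluating `mmse_pam` on an SNR grid and accumulating it numerically would give tables for the inverse MMSE and the rate function.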

However, a higher-order modulation scheme may increase the necessary offset power. In that case, a lower modulation
order might be optimal.



VII. Discussion

The variable $\lambda$ is found throughout the solutions in the application examples. We would like to point out its
significance by recapitulating its various interpretations. In Section III-A we showed that $\lambda$ represents the relative
weight of the denominator in the scalarized bi-criterion optimization problem. It can also be interpreted as the slope
of the trade-off curve between two objectives. In EE optimization, these two objectives are the sum rate and the
sum power. At the optimum, $\lambda^*$ is identical to the maximum EE adjusted with an appropriate system-dependent
scaling factor.

All the examples we considered resulted in water-filling solutions. It is noteworthy that $\lambda$ in these cases represents
a cut-off value, i.e. power is allocated for transmission through a channel only if the SNR value $\gamma$ is larger than $\lambda$.

VIII. Conclusions

There exist many results on EE optimization in wireless communications systems. Most papers formulate a novel
objective function and solve the corresponding optimization problem under certain constraints and assumptions for
a specific scenario. We feel that it is time to unify the various approaches and understand the core of this class of
problems. In this paper, motivated by a typical anecdotal scenario, we arrive at a non-convex optimization problem
of maximizing the ratio of achieved rate to dissipated power. It belongs to a class of problems called fractional
programs, for which a rich but scattered mathematical literature has evolved over the years. Therefore, we collect
and coherently present the results and offer a set of solution methods. The power models are carefully described in
order to motivate the problem formulation. Applications in various settings include time-invariant parallel channels,
time-varying flat-fading channels, and time-varying parallel channels, illustrating the usefulness of the framework.
As an extension to this framework, one could study the case where more general function classes, e.g. non-concave
functions, are used in the numerator of the EE metric. A framework that accommodates discrete optimization
variables would also be interesting for systems with on-off power modes, in which parts of a base station may be
turned off during off-peak hours. For these problems, other optimization methods will be needed in addition to
concave fractional programming.
Fig. 1. Illustration of the trade-off curve between $f_1(x^*)$ and $f_2(x^*)$, where $x^*$ is optimal for a given value of $\lambda$. The parameter $\lambda$ is the
slope of the tangent, whereas $F(\lambda)$ is given by the intersection with the vertical axis. The corresponding value of the objective function in (3)
is given by $\tan\theta$. The maximum occurs where $F(\lambda) = 0$.
Algorithm 1: The Dinkelbach method.
Data: $\lambda_0$ satisfying $F(\lambda_0) \ge 0$, tolerance $\Delta$

n = 0;

while $|F(\lambda_n)| > \Delta$ do

    Use $\lambda = \lambda_n$ in (6) to obtain $x_n^*$;

    $\lambda_{n+1} = f_1(x_n^*)/f_2(x_n^*)$;

    n++;

end
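
A compact Python rendering of the iteration, given as a sketch. The function `dinkelbach` and its arguments are hypothetical names, and the scalar test problem (one channel with $\gamma = 1$ and $\mu = 1$, so that the ratio is $\log(1+x)/(1+x)$) is an assumption chosen because its maximum $1/e$ at $x = e - 1$ is known in closed form.

```python
import math

def dinkelbach(f1, f2, argmax_inner, lam0, tol=1e-10, max_iter=100):
    """Maximize f1(x)/f2(x) by finding the root of F(lam).

    argmax_inner(lam) must return a maximizer of f1(x) - lam * f2(x).
    """
    lam = lam0
    for _ in range(max_iter):
        x = argmax_inner(lam)
        F = f1(x) - lam * f2(x)
        if abs(F) <= tol:
            break
        lam = f1(x) / f2(x)          # update step
    return lam, x

# Assumed test problem: maximize log(1 + x) / (1 + x) over x >= 0.
f1 = lambda x: math.log(1.0 + x)
f2 = lambda x: 1.0 + x
inner = lambda lam: max(1.0 / lam - 1.0, 0.0)   # stationary point of f1 - lam * f2
lam, x = dinkelbach(f1, f2, inner, lam0=0.1)
```

Here `lam0 = 0.1` satisfies $F(\lambda_0) \ge 0$, as the algorithm requires.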
Algorithm 2: The Dinkelbach method for energy-efficient link adaptation on a block fading, frequency-selective
channel as modeled by optimization problem (14).

Data: $\lambda_0$ satisfying $F(\lambda_0) \ge 0$, tolerance $\Delta$
n = 0;

while $|F(\lambda_n)| > \Delta$ do
    Calculate $\mathbf{p}^*$ from (16);

    $\lambda_{n+1} = f_1(\mathbf{p}^*)/f_2(\mathbf{p}^*)$;

    n++;

end
Fig. 2. Plot of the optimal EE as a function of $\mu$ for a frequency-selective channel. When $\mu$ is small, $\lambda = \lambda_{\max}$ due to the sum rate constraint
and the problem reduces to pure power minimization. Similarly, when $\mu$ is large, $\lambda = \lambda_{\min}$ due to the maximum sum power constraint and the
problem reduces to pure rate maximization. In both cases, there is a penalty in EE outside the interval.

Fig. 4. The maximum EE in time-varying MIMO channels with Rayleigh fading versus circuit power. The number of receive antennas is one.
The inset shows the enlarged region where $P_c \in [0, 1.5]$.

Fig. 5. Trade-off curve between transmit power and mutual information for Gaussian signals and m-QAM, respectively, in a single-carrier
system with $\mu = 1$. The dotted lines indicate the maximum EE.
TABLE I

Linear power model parameters

Parameter                   Description
$N_{\mathrm{sector}}$       # sectors
$P_{\mathrm{TX}}$           Tx power
$P_{\mathrm{SP}}$           Signal processing overhead
$C_{\mathrm{PSBB}}$         Battery backup and power supply loss
$N_{\mathrm{PApSec}}$       # PAs per sector
$\mu_{\mathrm{PA}}$         PA efficiency
$C_{\mathrm{C}}$            Cooling loss